### Sarvam Research Fellowship

Assignment - Implement Einops from Scratch

---

Aditya Raj


*   [Email](mailto:hexronus@gmail.com) - hexronus@gmail.com
*   [Portfolio](https://hexronus.vercel.app/) - https://hexronus.vercel.app
*   [LinkedIn](https://www.linkedin.com/in/hexronus/) - https://www.linkedin.com/in/hexronus/

**Task**

Summary:

The task is to implement the `einops.rearrange` function from scratch, with the following signature:

```python
def rearrange(tensor: np.ndarray, pattern: str, **axes_lengths) -> np.ndarray:
```

More specifically, the implementation has to support:

*   Reshaping
*   Transposition
*   Splitting of axes
*   Merging of axes
*   Repeating of axes

It must also provide pattern parsing, informative error messages, and improved performance (the optimization method is left open).
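For orientation, each required operation corresponds to a familiar NumPy primitive. The sketch below is illustrative only: the array shapes and the patterns shown in the comments are assumptions chosen for this example, not cases taken from the assignment or from the submitted code.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)    # axes named b, h, w for this example

# Transposition, e.g. "b h w -> w h b"
transposed = x.transpose(2, 1, 0)     # shape (4, 3, 2)

# Merging axes, e.g. "b h w -> b (h w)"
merged = x.reshape(2, 3 * 4)          # shape (2, 12)

# Splitting an axis, e.g. "b h (w1 w2) -> b h w1 w2" with w1=2
split = x.reshape(2, 3, 2, 2)         # shape (2, 3, 2, 2)

# Repeating along a length-1 axis (illustrative example with 4 copies)
y = np.arange(6).reshape(3, 1, 2)
repeated = np.repeat(y, 4, axis=1)    # shape (3, 4, 2)
```

A full `rearrange` has to derive these reshape and transpose arguments from the pattern string and `axes_lengths` instead of hard-coding them.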




---




**Proposed Approach**

*   Numba JIT compilation with parallel execution for small NumPy arrays
*   C++ ([Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page)) for non-trivial indexing and reordering

**Current Implementation**

The Eigen (C++) backend for indexing and reordering is implemented; the Numba backend is a work in progress. With the Eigen backend alone, the implementation is reported to be 1.53 times faster than the `einops` library on average.

# Performance Comparison: Custom Eigen-Based Rearrange vs. Einops

## Overview
This section evaluates a custom tensor rearrangement implementation, built on the Eigen library with a C++ backend and a pybind11 interface, against the popular `einops` library across 15 test cases. The custom solution outperforms `einops` in six tests, with speedups of **1.43x to 4.81x**, excelling in:
- **Basic 2D/3D transpositions** (e.g., 4.81x in Test 2)
- **Non-contiguous memory** (2.46x in Test 10)
- **Complex numbers** (1.43x in Test 11)

Averaged over all 15 test cases, the custom implementation is reported to be `1.53x` faster than `einops`.

Conversely, `einops` outperforms the custom implementation in nine tests, particularly on high-dimensional tensors and edge cases (e.g., 3.62x faster in Test 5 with empty dimensions). By mean runtime, the custom approach is **22.14% faster** (0.0000971 s vs. 0.0001186 s, a ratio of about 1.22x), which reflects its edge in specific scenarios. This is a different measure from the 1.53x figure, which presumably averages the per-test speedups rather than comparing mean runtimes.
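The arithmetic behind the mean-runtime comparison can be checked directly. The sketch below uses the two mean runtimes quoted above; the per-test timings in the second half are hypothetical numbers, included only to show that an average of per-test speedups and a ratio of mean runtimes are different quantities. The helper names are assumptions made for this example.

```python
from statistics import mean

def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s

def percent_faster(baseline_s, candidate_s):
    """Speedup expressed as a percentage above 1x."""
    return (speedup(baseline_s, candidate_s) - 1.0) * 100.0

# Mean runtimes quoted in the text (seconds).
einops_mean = 0.0001186
custom_mean = 0.0000971

ratio = speedup(einops_mean, custom_mean)
pct = percent_faster(einops_mean, custom_mean)
print(f"{ratio:.2f}x, {pct:.2f}% faster")

# Hypothetical per-test timings (NOT measured data) for three tests.
baseline_times = [1.0, 1.0, 4.0]
candidate_times = [0.5, 2.0, 1.0]

mean_of_ratios = mean(b / c for b, c in zip(baseline_times, candidate_times))
ratio_of_means = mean(baseline_times) / mean(candidate_times)
```

With the hypothetical timings, the mean of per-test ratios comes out larger than the ratio of means, because a few large per-test speedups dominate the first average.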

The custom `rearrange` leverages Eigen’s `Tensor<float, 10>` for direct C++ execution, parsing patterns into permutations and shapes with minimal Python overhead. Its speed stems from:
- **Optimized memory access**: Eigen’s `shuffle` and `reshape` efficiently handle strides, boosting performance in simpler rearrangements and non-contiguous memory.
- **Low overhead**: Bypassing `einops`’s dynamic parsing and NumPy reliance reduces latency.
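To make "parsing patterns into permutations and shapes" concrete, here is a minimal pure-Python sketch of that step for plain transpositions only. It is not the author's `parser.py`; the function names are hypothetical, and grouped axes, ellipsis, and `axes_lengths` are deliberately left out.

```python
def pattern_to_permutation(pattern):
    """Turn a pattern such as 'h w c -> c h w' into an axis permutation."""
    left, arrow, right = pattern.partition("->")
    if not arrow:
        raise ValueError(f"pattern {pattern!r} is missing '->'")
    in_axes, out_axes = left.split(), right.split()
    if len(set(in_axes)) != len(in_axes):
        raise ValueError(f"duplicate axis name on the input side: {in_axes}")
    if sorted(in_axes) != sorted(out_axes):
        raise ValueError(
            f"input axes {in_axes} and output axes {out_axes} do not match"
        )
    # For every output axis, look up its position on the input side.
    return tuple(in_axes.index(name) for name in out_axes)

def permuted_shape(shape, permutation):
    """Shape of the result after applying the permutation to the axes."""
    return tuple(shape[i] for i in permutation)

perm = pattern_to_permutation("h w c -> c h w")
out_shape = permuted_shape((32, 64, 3), perm)
```

A permutation in this form is the kind of input a transpose or shuffle primitive expects; how the actual backend consumes it is not shown in this section.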

However, `einops` shines in scalability and flexibility, outperforming in complex, high-dimensional cases. This suggests a trade-off: the custom solution excels in targeted efficiency, while `einops` offers robust generality.

In [1]:
!gdown 1GpGQh7M_F5h21Qa1YBi4mDWYEdd3R4Ay

Downloading...
From: https://drive.google.com/uc?id=1GpGQh7M_F5h21Qa1YBi4mDWYEdd3R4Ay
To: /content/rearrange.zip
100% 6.56M/6.56M [00:00<00:00, 64.8MB/s]


In [2]:
!unzip rearrange.zip -d .

Archive:  rearrange.zip
   creating: ./rearrange/
   creating: ./rearrange/build/
   creating: ./rearrange/build/lib.linux-x86_64-cpython-311/
   creating: ./rearrange/build/lib.linux-x86_64-cpython-311/rearrange/
  inflating: ./rearrange/build/lib.linux-x86_64-cpython-311/rearrange/eigen_backend.cpython-311-x86_64-linux-gnu.so  
   creating: ./rearrange/build/temp.linux-x86_64-cpython-311/
  inflating: ./rearrange/build/temp.linux-x86_64-cpython-311/eigen_backend.o  
  inflating: ./rearrange/core.py     
  inflating: ./rearrange/cuda_backend.py  
  inflating: ./rearrange/eigen_backend.cpp  
  inflating: ./rearrange/eigen_backend.cpython-311-x86_64-linux-gnu.so  
  inflating: ./rearrange/eigen_backend_wrapper.py  
  inflating: ./rearrange/numba_backend.py  
  inflating: ./rearrange/parser.py   
  inflating: ./rearrange/setup.py    
  inflating: ./rearrange/__init__.py  
   creating: ./rearrange/__pycache__/
  inflating: ./rearrange/__pycache__/core.cpython-311.pyc  
  inflating: ./rear

In [3]:
!pip install pybind11

Collecting pybind11
  Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 243.3/243.3 kB 12.0 MB/s eta 0:00:00
Installing collected packages: pybind11
Successfully installed pybind11-2.13.6


In [4]:
!sudo apt-get install libeigen3-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  libeigen3-doc libmpfrc++-dev
The following NEW packages will be installed:
  libeigen3-dev
0 upgraded, 1 newly installed, 0 to remove and 30 not upgraded.
Need to get 1,056 kB of archives.
After this operation, 9,081 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libeigen3-dev all 3.4.0-2ubuntu2 [1,056 kB]
Fetched 1,056 kB in 0s (7,526 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected p

In [5]:
%cd rearrange/

/content/rearrange


In [6]:
!python setup.py build_ext --inplace

running build_ext


If the build command above does not run, delete `eigen_backend.cpython-311-x86_64-linux-gnu.so` and run `!python setup.py build_ext --inplace` again.
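The manual cleanup can also be scripted. The helper below is a hypothetical convenience, not part of the submitted package: it removes compiled `eigen_backend*.so` files that sit directly inside a package directory, so that the next build has to recreate them.

```python
from pathlib import Path

def remove_stale_extensions(package_dir, stem="eigen_backend"):
    """Delete `<stem>*.so` files directly inside package_dir.

    Returns the names of the files that were removed. Subdirectories
    such as build/ are not searched.
    """
    removed = []
    for so_file in sorted(Path(package_dir).glob(f"{stem}*.so")):
        so_file.unlink()
        removed.append(so_file.name)
    return removed

# Example (run from /content in the notebook):
# remove_stale_extensions("rearrange")
```

As an alternative that may avoid deleting files by hand, `build_ext` accepts a `--force` flag that rebuilds regardless of file timestamps.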

In [7]:
!ls

build		 eigen_backend.cpp				__init__.py	  __pycache__
core.py		 eigen_backend.cpython-311-x86_64-linux-gnu.so	numba_backend.py  setup.py
cuda_backend.py  eigen_backend_wrapper.py			parser.py


In [8]:
%cd ..

/content


In [11]:
# Download the tests.py file
!gdown 1XbZFZJx7SxtwD3qAV1OSLsxeVIubEFjw

Downloading...
From (original): https://drive.google.com/uc?id=1XbZFZJx7SxtwD3qAV1OSLsxeVIubEFjw
From (redirected): https://drive.google.com/uc?id=1XbZFZJx7SxtwD3qAV1OSLsxeVIubEFjw&confirm=t&uuid=edb14666-4f11-4256-9efc-5133df2e6bfe
To: /content/tests.py
100% 4.30k/4.30k [00:00<00:00, 9.81MB/s]


In [12]:
!python tests.py

platform linux -- Python 3.11.11, pytest-8.3.5, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content
plugins: typeguard-4.4.2, anyio-4.9.0, langsmith-0.3.22
collecting ... collected 17 items

tests.py::test_basic_group_split PASSED                                                      [  5%]
tests.py::test_fully_specified_group PASSED                                                  [ 11%]
tests.py::test_invalid_group_size PASSED                                                     [ 17%]
tests.py::test_too_many_unknowns PASSED                                                      [ 23%]
tests.py::test_ellipsis_basic PASSED                                                         [ 29%]
tests.py::test_ellipsis_with_group PASSED