### Sarvam Research Fellowship

Assignment - Implement Einops from Scratch

---

Aditya Raj


[Email](mailto:hexronus@gmail.com) - hexronus@gmail.com\
[Portfolio](https://hexronus.vercel.app/) - https://hexronus.vercel.app\
[LinkedIn](https://www.linkedin.com/in/hexronus/) - https://www.linkedin.com/in/hexronus/

**Task**

Summary:

The task is to implement the `einops.rearrange` function from scratch, with the following signature:

```python
def rearrange(tensor: np.ndarray, pattern: str, **axes_lengths) -> np.ndarray:
```

More specifically, the implementation has to support:

*   Reshaping
*   Transposition
*   Splitting of axes
*   Merging of axes
*   Repeating of axes

It should also include pattern parsing, informative error messages, and some form of performance optimization.
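
For reference, each of the operations listed above has a plain NumPy counterpart. The sketch below is illustrative only: the array shape and axis names are made up, and the equivalent einops-style pattern is given in each comment.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # axes: a=2, b=3, c=4

# Transposition: 'a b c -> c a b'
transposed = np.transpose(x, (2, 0, 1))

# Merging of axes: 'a b c -> (a b) c'
merged = x.reshape(2 * 3, 4)

# Splitting of axes: 'a b (c1 c2) -> a b c1 c2' with c1=2
split = x.reshape(2, 3, 2, 2)

# Reshaping combined with transposition: 'a b c -> c (a b)'
combined = np.transpose(x, (2, 0, 1)).reshape(4, 2 * 3)

# Repeating along a new axis: 'a b c -> a b c r' with r=5
repeated = np.broadcast_to(x[..., np.newaxis], (2, 3, 4, 5))

for name, arr in [("transposed", transposed), ("merged", merged),
                  ("split", split), ("combined", combined),
                  ("repeated", repeated)]:
    print(name, arr.shape)
```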



---



**Proposed Approach**

*   Numba JIT compilation with parallel execution for small NumPy arrays
*   C++ ([Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page)) for non-trivial indexing and reordering

**Current Implementation**

The Eigen (C++) backend for indexing and reordering is implemented; the Numba backend is a work in progress. With the Eigen backend, the implementation is 1.53 times faster than the einops library on average runtime.

# Performance Comparison: Custom Eigen-Based Rearrange vs. Einops

## Overview
This section evaluates a custom tensor rearrangement implementation, built on the Eigen library with a C++ backend and a Pybind11 interface, against the popular `einops` library across 15 test cases. The custom solution outperforms `einops` in six tests, with speedups of **1.43x to 4.81x**, excelling in:
- **Basic 2D/3D transpositions** (e.g., 4.81x in Test 2)
- **Non-contiguous memory** (2.46x in Test 10)
- **Complex numbers** (1.43x in Test 11)
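
The non-contiguous case refers to inputs such as strided slices, whose memory layout is not C-contiguous. A small NumPy illustration follows; the array mirrors the `[:, ::2, :]` slice used in the tests, and nothing here is specific to the Eigen backend.

```python
import numpy as np

base = np.ones((3, 4, 5))
view = base[:, ::2, :]                 # every second index along axis 1: shape (3, 2, 5)

print(view.flags["C_CONTIGUOUS"])      # the view skips elements of `base`
print(view.strides)                    # strides are inherited from `base`

# Copying into a fresh buffer gives a contiguous layout.
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"])
```

A backend that reads the raw buffer has to either honour the strides or work on a packed copy like `packed`.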

On **average runtime** across all 15 test cases, the custom implementation surpasses `einops` by `1.53x`.

Conversely, `einops` is faster than the custom implementation in nine tests, particularly with high-dimensional tensors and edge cases (e.g., 3.62x faster in Test 5 with empty dimensions). Comparing mean runtimes, the custom approach is **22.14% faster** (0.0000971s vs. 0.0001186s), which highlights its edge in specific scenarios.
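
The 22.14% figure follows directly from the two mean runtimes quoted. A short check, using only those two numbers from the text (the variable names are illustrative):

```python
einops_mean = 0.0001186   # mean einops runtime in seconds, from the text
custom_mean = 0.0000971   # mean custom runtime in seconds, from the text

ratio = einops_mean / custom_mean          # ratio of the two mean runtimes
percent_faster = (ratio - 1.0) * 100.0     # the same figure as a percentage

print(f"ratio of means: {ratio:.2f}x")
print(f"percent faster: {percent_faster:.2f}%")
```

Note that a ratio of mean runtimes and a mean of per-test speedups are different statistics and generally give different numbers, which is worth keeping in mind when reading the `1.53x` and 22.14% figures side by side.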

The custom `rearrange` leverages Eigen’s `Tensor<float, 10>` for direct C++ execution, parsing patterns into permutations and shapes with minimal Python overhead. Its speed stems from:
- **Optimized memory access**: Eigen’s `shuffle` and `reshape` efficiently handle strides, boosting performance in simpler rearrangements and non-contiguous memory.
- **Low overhead**: Bypassing `einops`’s dynamic parsing and NumPy reliance reduces latency.
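
To make the "patterns into permutations and shapes" step concrete, here is a minimal pure-NumPy sketch of the same idea. It is not the Eigen backend from this submission: `rearrange_sketch` and its helper `_parse_side` are hypothetical names, and the sketch deliberately ignores ellipsis, `1` axes and repetition.

```python
import numpy as np

def _parse_side(side):
    """Split one side of a pattern into groups: 'b (h w) c' -> [['b'], ['h', 'w'], ['c']]."""
    groups, current, in_group = [], None, False
    for token in side.replace("(", " ( ").replace(")", " ) ").split():
        if token == "(":
            if in_group:
                raise ValueError("nested parentheses are not supported")
            current, in_group = [], True
        elif token == ")":
            if not in_group:
                raise ValueError("unbalanced parentheses")
            groups.append(current)
            in_group = False
        elif in_group:
            current.append(token)
        else:
            groups.append([token])
    if in_group:
        raise ValueError("unbalanced parentheses")
    return groups

def rearrange_sketch(x, pattern, **axes_lengths):
    left, right = (_parse_side(s) for s in pattern.split("->"))
    if len(left) != x.ndim:
        raise ValueError(f"pattern has {len(left)} input axes, tensor has {x.ndim}")

    # Step 1: resolve the length of every named axis on the input side.
    sizes = dict(axes_lengths)
    for group, dim in zip(left, x.shape):
        unknown = [name for name in group if name not in sizes]
        known = 1
        for name in group:
            if name in sizes:
                known *= sizes[name]
        if len(unknown) > 1:
            raise ValueError(f"too many unknown axes in group {group}")
        if unknown:
            if known == 0 or dim % known:
                raise ValueError(f"cannot split axis of length {dim} by {known}")
            sizes[unknown[0]] = dim // known
        elif known != dim:
            raise ValueError(f"group {group} has size {known}, axis has {dim}")

    # Step 2: reshape so that every named axis gets its own dimension.
    flat_in = [name for group in left for name in group]
    flat_out = [name for group in right for name in group]
    if sorted(flat_in) != sorted(flat_out):
        raise ValueError("both sides must use the same axis names")
    x = x.reshape([sizes[name] for name in flat_in])

    # Step 3: permute to the output order, then merge the output groups.
    x = x.transpose([flat_in.index(name) for name in flat_out])
    out_shape = []
    for group in right:
        n = 1
        for name in group:
            n *= sizes[name]
        out_shape.append(n)
    return x.reshape(out_shape)

x = np.arange(12 * 10).reshape(12, 10)
print(rearrange_sketch(x, "(h w) c -> h w c", h=3).shape)
```

The three steps (reshape to elementary axes, permute, reshape to merged axes) are the same decomposition that a `shuffle` plus `reshape` backend would need, whatever library performs them.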

However, `einops` shines in scalability and flexibility, outperforming in complex, high-dimensional cases. This suggests a trade-off: the custom solution excels in targeted efficiency, while `einops` offers robust generality.

In [1]:
!gdown 1fG1dP8nkdjj9RmQiXeJDGaLUGHMh81DJ

Downloading...
From: https://drive.google.com/uc?id=1fG1dP8nkdjj9RmQiXeJDGaLUGHMh81DJ
To: /content/rearrange.zip
100% 6.56M/6.56M [00:00<00:00, 30.2MB/s]


In [2]:
!unzip rearrange.zip -d .

Archive:  rearrange.zip
   creating: ./rearrange/
   creating: ./rearrange/build/
   creating: ./rearrange/build/lib.linux-x86_64-cpython-311/
   creating: ./rearrange/build/lib.linux-x86_64-cpython-311/rearrange/
  inflating: ./rearrange/build/lib.linux-x86_64-cpython-311/rearrange/eigen_backend.cpython-311-x86_64-linux-gnu.so  
   creating: ./rearrange/build/temp.linux-x86_64-cpython-311/
  inflating: ./rearrange/build/temp.linux-x86_64-cpython-311/eigen_backend.o  
  inflating: ./rearrange/core.py     
  inflating: ./rearrange/cuda_backend.py  
  inflating: ./rearrange/eigen_backend.cpp  
  inflating: ./rearrange/eigen_backend.cpython-311-x86_64-linux-gnu.so  
  inflating: ./rearrange/eigen_backend_wrapper.py  
  inflating: ./rearrange/numba_backend.py  
  inflating: ./rearrange/parser.py   
  inflating: ./rearrange/setup.py    
  inflating: ./rearrange/__init__.py  
   creating: ./rearrange/__pycache__/
  inflating: ./rearrange/__pycache__/core.cpython-311.pyc  
  inflating: ./rear

In [3]:
!pip install pybind11

Collecting pybind11
  Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
Installing collected packages: pybind11
Successfully installed pybind11-2.13.6


In [4]:
!sudo apt-get install libeigen3-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  libeigen3-doc libmpfrc++-dev
The following NEW packages will be installed:
  libeigen3-dev
0 upgraded, 1 newly installed, 0 to remove and 30 not upgraded.
Need to get 1,056 kB of archives.
After this operation, 9,081 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libeigen3-dev all 3.4.0-2ubuntu2 [1,056 kB]
Fetched 1,056 kB in 1s (1,267 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected p

In [5]:
%cd rearrange/

/content/rearrange


In [6]:
!python setup.py build_ext --inplace

running build_ext


If `!python setup.py build_ext --inplace` does not rebuild the extension, delete `eigen_backend.cpython-311-x86_64-linux-gnu.so` and run the command again.

In [7]:
!ls

build		 eigen_backend.cpp				__init__.py	  __pycache__
core.py		 eigen_backend.cpython-311-x86_64-linux-gnu.so	numba_backend.py  setup.py
cuda_backend.py  eigen_backend_wrapper.py			parser.py


In [8]:
%cd ..

/content


In [9]:
#download the tests.py file
!gdown 1Bv5XDrHLh0SnFMMjiUM7_WOGXPwPwFt4

Downloading...
From (original): https://drive.google.com/uc?id=1Bv5XDrHLh0SnFMMjiUM7_WOGXPwPwFt4
From (redirected): https://drive.google.com/uc?id=1Bv5XDrHLh0SnFMMjiUM7_WOGXPwPwFt4&confirm=t&uuid=a570f679-d264-4436-b8c6-a2801202113a
To: /content/tests.py
100% 7.34k/7.34k [00:00<00:00, 23.7MB/s]


In [11]:
!python tests.py

platform linux -- Python 3.11.11, pytest-8.3.5, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content
plugins: typeguard-4.4.2, anyio-4.9.0, langsmith-0.3.22
collecting ... collected 34 items

tests.py::test_rearrange_basic_group_split PASSED                                            [  2%]
tests.py::test_rearrange_fully_specified_group PASSED                                        [  5%]
tests.py::test_rearrange_invalid_group_size PASSED                                           [  8%]
tests.py::test_rearrange_too_many_unknowns PASSED                                            [ 11%]
tests.py::test_rearrange_ellipsis_basic PASSED                                               [ 14%]
tests.py::test_rearrange_ellipsis_with_group PASSED

On average, a single test takes 0.0014s to run.

The custom tests below follow the submission guidelines in the assignment docs:

**Include separate cells with your unit tests**

In [12]:
import numpy as np
from einops import rearrange as einops_rearrange
from rearrange import rearrange as custom_rearrange
import pytest
from tabulate import tabulate

def test_basic_group_split(rearrange_impl):
    x = np.random.rand(12, 10)
    result = rearrange_impl(x, '(h w) c -> h w c', h=3)
    return result.shape

def test_ellipsis_with_group(rearrange_impl):
    x = np.random.rand(2, 12, 10)
    result = rearrange_impl(x, 'b (h w) c -> b h w c', h=3)
    return result.shape

def test_high_dimensional(rearrange_impl):
    x = np.random.rand(2, 3, 4, 5, 6)
    result = rearrange_impl(x, 'a b c d e -> e (d c b a)')
    return result.shape

def test_fully_specified(rearrange_impl):
    x = np.random.rand(12, 10)
    result = rearrange_impl(x, '(h w) c -> h w c', h=3, w=4)
    return result.shape

def test_ellipsis_basic(rearrange_impl):
    x = np.random.rand(2, 3, 4)
    result = rearrange_impl(x, '... c -> c ...')
    return result.shape

def test_singleton_axis(rearrange_impl):
    x = np.ones((1, 5, 1))
    result = rearrange_impl(x, '1 h 1 -> h')
    return result.shape

def test_empty_tensor(rearrange_impl):
    x = np.random.rand(0, 3, 4)
    result = rearrange_impl(x, 'b h w -> w (h b)')
    return result.shape

def test_complex_numbers(rearrange_impl):
    x = np.ones((2, 3), dtype=complex)
    result = rearrange_impl(x, 'h w -> w h')
    return result.shape

def test_non_contiguous(rearrange_impl):
    x = np.ones((3, 4, 5))[:, ::2, :]
    result = rearrange_impl(x, 'h w c -> c (w h)')
    return result.shape

def test_large_dims(rearrange_impl):
    x = np.random.rand(10, 20, 30)
    result = rearrange_impl(x, 'a b c -> (a b) c')
    return result.shape

def test_full_flatten(rearrange_impl):
    x = np.random.rand(2, 3, 4)
    result = rearrange_impl(x, 'a b c -> (a b c)')
    return result.shape

def test_add_singleton(rearrange_impl):
    x = np.ones((2, 3))
    result = rearrange_impl(x, 'h w -> h w 1')
    return result.shape

def test_asymmetric(rearrange_impl):
    x = np.ones((3, 5, 7))
    result = rearrange_impl(x, 'a b c -> a (b c)')
    return result.shape

def test_no_op(rearrange_impl):
    x = np.random.rand(5, 6)
    result = rearrange_impl(x, 'h w -> h w')
    return result.shape

def test_very_high_dim(rearrange_impl):
    x = np.random.rand(2, 3, 4, 5, 6, 7)
    result = rearrange_impl(x, 'a b c d e f -> f (e d c b a)')
    return result.shape

test_functions = [
    test_basic_group_split,
    test_ellipsis_with_group,
    test_high_dimensional,
    test_fully_specified,
    test_ellipsis_basic,
    test_singleton_axis,
    test_empty_tensor,
    test_complex_numbers,
    test_non_contiguous,
    test_large_dims,
    test_full_flatten,
    test_add_singleton,
    test_asymmetric,
    test_no_op,
    test_very_high_dim
]

results = []
for test_func in test_functions:
    einops_result = test_func(einops_rearrange)
    custom_result = test_func(custom_rearrange)
    results.append([
        test_func.__name__,
        str(einops_result),
        str(custom_result)
    ])

headers = ["Test Case", "einops_rearrange Result", "custom_rearrange Result"]
print(tabulate(results, headers=headers, tablefmt="pipe"))

| Test Case                | einops_rearrange Result   | custom_rearrange Result   |
|:-------------------------|:--------------------------|:--------------------------|
| test_basic_group_split   | (3, 4, 10)                | (3, 4, 10)                |
| test_ellipsis_with_group | (2, 3, 4, 10)             | (2, 3, 4, 10)             |
| test_high_dimensional    | (6, 120)                  | (6, 120)                  |
| test_fully_specified     | (3, 4, 10)                | (3, 4, 10)                |
| test_ellipsis_basic      | (4, 2, 3)                 | (4, 2, 3)                 |
| test_singleton_axis      | (5,)                      | (5,)                      |
| test_empty_tensor        | (4, 0)                    | (4, 0)                    |
| test_complex_numbers     | (3, 2)                    | (3, 2)                    |
| test_non_contiguous      | (5, 6)                    | (5, 6)                    |
| test_large_dims          | (200, 30)                 | (200, 30

In [13]:
def test1(var):
  x = np.random.rand(3, 4)
  result = var(x, 'h w -> w h')
  print(result.shape)

test1(einops_rearrange)
test1(custom_rearrange)

(4, 3)
(4, 3)


In [14]:
# Expected to fail in both implementations: `b` appears only on the output side, so this exercises the error path

def test2(var):
    x = np.random.rand(3, 1, 5)
    result = var(x, 'a 1 c -> a b c', b=4)
    return result.shape

results = {
    "einops_rearrange": {"status": None, "output": None},
    "custom_rearrange": {"status": None, "output": None}
}

print("Running einops_rearrange:")
try:
    shape = test2(einops_rearrange)
    print(f"Output shape: {shape}")
    results["einops_rearrange"] = {"status": "Success", "output": str(shape)}
except Exception as e:
    results["einops_rearrange"] = {"status": "Error", "output": str(e)}

print("\nRunning custom_rearrange:")
try:
    shape = test2(custom_rearrange)
    print(f"Output shape: {shape}")
    results["custom_rearrange"] = {"status": "Success", "output": str(shape)}
except Exception as e:
    results["custom_rearrange"] = {"status": "Error", "output": str(e)}

print("\n=== Test Results ===")
"""
table = [
    ["Function", "Status", "Output"],
    ["einops_rearrange", results["einops_rearrange"]["status"], results["einops_rearrange"]["output"]],
    ["custom_rearrange", results["custom_rearrange"]["status"], results["custom_rearrange"]["output"]]
]
print(tabulate(table, headers="firstrow", tablefmt="grid"))
"""

print("+-------------------+---------+----------------------------+")
print("| Function          | Status  | Output                     |")
print("+-------------------+---------+----------------------------+")
print(f"| einops_rearrange  | {results['einops_rearrange']['status']:<7} | {results['einops_rearrange']['output']:<26} |")
print()
print()
print()
print(f"| custom_rearrange  | {results['custom_rearrange']['status']:<7} | {results['custom_rearrange']['output']:<26} |")
print("+-------------------+---------+----------------------------+")

Running einops_rearrange:

Running custom_rearrange:

=== Test Results ===
+-------------------+---------+----------------------------+
| Function          | Status  | Output                     |
+-------------------+---------+----------------------------+
| einops_rearrange  | Error   |  Error while processing rearrange-reduction pattern "a 1 c -> a b c".
 Input tensor shape: (3, 1, 5). Additional info: {'b': 4}.
 Identifiers only on one side of expression (should be on both): {'b'} |



| custom_rearrange  | Error   | Shape mismatch: total size 15 != expected 60 |
+-------------------+---------+----------------------------+
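
Both implementations reject this pattern because `b` appears only on the output side. If repetition along the length-1 axis is what is intended, NumPy broadcasting is one way to sketch it. This illustrates the operation itself, not the behaviour of either `rearrange` function.

```python
import numpy as np

x = np.random.rand(3, 1, 5)

# Repeat the singleton middle axis 4 times: conceptually 'a 1 c -> a b c' with b=4.
# np.broadcast_to returns a read-only view, so copy it if the result must be writable.
repeated = np.broadcast_to(x, (3, 4, 5)).copy()

print(repeated.shape)
```

The result holds 3 * 4 * 5 = 60 elements, which matches the "expected 60" in the custom error message above.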


In [15]:
def test3(var):
  x = np.random.rand(3, 4, 5)
  result = var(x, 'a b c -> (a b) c')
  print(result.shape)

test3(einops_rearrange)
test3(custom_rearrange)

(12, 5)
(12, 5)


In [16]:
def test4(var):
  x = np.random.rand(12, 10)
  result = var(x, '(h w) c -> h w c', h=3)
  print(result.shape)

test4(einops_rearrange)
test4(custom_rearrange)

(3, 4, 10)
(3, 4, 10)


In [17]:
def test5(var):
  x = np.random.rand(2, 3, 4, 5)
  result = var(x, '... h w -> ... (h w)')
  print(result.shape)

test5(einops_rearrange)
test5(custom_rearrange)

(2, 3, 20)
(2, 3, 20)
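
One way to support `...` is to expand it into explicit axis names before the pattern is parsed. The sketch below assumes that approach: `expand_ellipsis` and the generated `_e0`, `_e1` names are hypothetical and not taken from the submitted parser, and the ellipsis is assumed to sit outside any parentheses.

```python
def expand_ellipsis(pattern, ndim):
    """Replace '...' on both sides of a pattern with explicit placeholder axis names."""
    left, right = (side.strip() for side in pattern.split("->"))
    if "..." not in left:
        if "..." in right:
            raise ValueError("ellipsis appears on the output side only")
        return pattern

    # Count the input axes consumed by everything except the ellipsis:
    # a parenthesised group counts as one axis, a bare name counts as one.
    depth, explicit = 0, 0
    for token in left.replace("(", " ( ").replace(")", " ) ").split():
        if token == "(":
            depth += 1
            explicit += 1
        elif token == ")":
            depth -= 1
        elif token != "..." and depth == 0:
            explicit += 1

    n_extra = ndim - explicit
    if n_extra < 0:
        raise ValueError("tensor has fewer axes than the pattern requires")
    names = " ".join(f"_e{i}" for i in range(n_extra))

    def substitute(side):
        # Insert the placeholder names and normalise whitespace.
        return " ".join(side.replace("...", names).split())

    return f"{substitute(left)} -> {substitute(right)}"

print(expand_ellipsis("... h w -> ... (h w)", ndim=4))
```

For the 4-D input used above, the pattern becomes `_e0 _e1 h w -> _e0 _e1 (h w)`, which an ellipsis-free parser can then handle.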


In [18]:
def test6(var):
  x = np.random.rand(2, 3, 4, 5)
  result = var(x, 'a b (c1 c2) d -> c2 c1 a b d', c2=2)
  print(result.shape)  # expected: (2, 2, 2, 3, 5)

test6(einops_rearrange)
test6(custom_rearrange)
